GS4: Generating Synthetic Samples for Semi-Supervised Nearest Neighbor Classification
Authors
Abstract
In this paper, we propose a method to improve nearest neighbor classification accuracy under a semi-supervised setting. We call our approach GS4 (i.e., Generating Synthetic Samples Semi-Supervised). Existing self-training approaches classify unlabeled samples by exploiting local information. These samples are then incorporated into the training set of labeled data. However, errors are propagated and misclassifications at an early stage severely degrade the classification accuracy. To address this problem, the proposed method exploits the unlabeled data by using weights proportional to the classification confidence to generate synthetic samples. Specifically, our scheme is inspired by the Synthetic Minority Over-Sampling Technique. That is, each unlabeled sample is used to generate as many labeled samples as the number of classes represented by its k-nearest neighbors. In particular, the distance of each synthetic sample from its k-nearest neighbors of the same class is proportional to the classification confidence. As a result, the robustness to misclassification errors is increased and better accuracy is achieved. Experimental results using publicly available datasets demonstrate that statistically significant improvements are obtained when the proposed approach is employed.
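The generation step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It makes several assumptions that the abstract does not state: the classification confidence for a class is the fraction of the k nearest labeled neighbors belonging to that class, the same-class neighbors are summarized by their centroid, and the synthetic sample is placed on the segment from that centroid toward the unlabeled sample (SMOTE-style interpolation). The function name `generate_synthetic_samples` and its parameters are hypothetical.

```python
import numpy as np


def generate_synthetic_samples(X_labeled, y_labeled, X_unlabeled, k=3):
    """Sketch of confidence-weighted synthetic sample generation.

    Assumptions (not taken from the paper): confidence is the k-NN vote
    fraction, and same-class neighbors are represented by their centroid.
    """
    X_labeled = np.asarray(X_labeled, dtype=float)
    y_labeled = np.asarray(y_labeled)
    k = min(k, len(X_labeled))  # cannot use more neighbors than labeled points

    synth_X, synth_y = [], []
    for x in np.asarray(X_unlabeled, dtype=float):
        # Indices of the k nearest labeled neighbors (Euclidean distance).
        dists = np.linalg.norm(X_labeled - x, axis=1)
        nn_idx = np.argsort(dists)[:k]
        nn_labels = y_labeled[nn_idx]

        # One synthetic sample per class represented among the neighbors.
        for c in np.unique(nn_labels):
            same_class = nn_idx[nn_labels == c]
            confidence = len(same_class) / k          # assumed: vote fraction
            anchor = X_labeled[same_class].mean(axis=0)  # assumed: centroid
            # Distance from the anchor grows in proportion to the confidence:
            # low confidence keeps the sample close to known labeled points.
            synth_X.append(anchor + confidence * (x - anchor))
            synth_y.append(c)

    return np.array(synth_X), np.array(synth_y)
```

Under these assumptions, a low-confidence label yields a synthetic sample that stays near the labeled neighbors of that class, so an early misclassification moves the decision boundary less than inserting the unlabeled sample itself with a hard label would.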
Similar resources
Semi-supervised deep learning by metric embedding
Deep networks are successfully used as classification models yielding state-of-the-art results when trained on a large number of labeled samples. These models, however, are usually much less suited for semi-supervised problems because of their tendency to overfit easily when trained on small amounts of data. In this work we will explore a new training objective that is targeting a semi-supervise...
Semi-Supervised Spectral Mapping for Enhancing Separation between Classes
We present a spectral mapping technique for semi-supervised pattern classification. Importance scores of features are first evaluated with a semi-supervised feature selection algorithm by Zhao et al. Training data are then embedded into a low-dimensional space with a spectral mapping derived from the selected and weighted feature vectors, with which test data are classified by the nearest neigh...
Semi-supervised multi-label image classification based on nearest neighbor editing
Semi-supervised multi-label classification has been applied to many real-world tasks, such as image classification and document classification. In semi-supervised learning, unlabeled samples are added to the training set to enhance classification performance; however, noise is introduced at the same time. In order to reduce this negative effect, the nearest neighbor data edi...
Improved Nearest Neighbor Methods For Text Classification
We present new nearest neighbor methods for text classification and an evaluation of these methods against the existing nearest neighbor methods as well as other well-known text classification algorithms. Inspired by the language modeling approach to information retrieval, we show improvements in k-nearest neighbor (kNN) classification by replacing the classical cosine similarity with a KL dive...
Using the Mutual k-Nearest Neighbor Graphs for Semi-supervised Classification on Natural Language Data
The first step in graph-based semi-supervised classification is to construct a graph from input data. While the k-nearest neighbor graphs have been the de facto standard method of graph construction, this paper advocates using the less well-known mutual k-nearest neighbor graphs for high-dimensional natural language data. To compare the performance of these two graph construction methods, we ru...